@BobMerkus

Add Fortran Language Parser Support

Summary

This PR adds Fortran language parsing support to LangChain's LanguageParser, enabling syntax-aware code splitting for Fortran source files.

Changes

  • Implements FortranSegmenter class for parsing Fortran code structure
  • Adds Fortran ('fortran' or 'f90') to supported languages in LanguageParser
  • Updates LANGUAGE_EXTENSIONS and LANGUAGE_SEGMENTERS mappings
  • Enables extraction of top-level functions, subroutines, and modules into separate documents
  • Supports common Fortran file extensions (.f, .f90, .f95, etc.)
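For reviewers unfamiliar with the segmenter idea, here is a rough sketch of what "extracting top-level functions, subroutines, and modules" can look like. It is not the FortranSegmenter code from this PR: it is a simplified regex-based stand-in, and the names `split_top_level_units`, `_START`, and `_END` are hypothetical. It assumes free-form source where every unit is closed with an explicit `end module`, `end subroutine`, or `end function` (a bare `end`, as in some F77 code, is not handled).

```python
import re

# Hypothetical patterns, for illustration only.
# Optional prefixes (e.g. "pure", "real(kind=8)") followed by a unit keyword and name.
_START = re.compile(
    r"^\s*(?:(?:recursive|pure|elemental|integer|real|logical|complex|character"
    r"|double\s+precision)(?:\([^)]*\))?\s+)*"
    r"(?P<kind>module(?!\s+procedure\b)|subroutine|function)\s+(?P<name>[a-z_]\w*)",
    re.IGNORECASE,
)
_END = re.compile(r"^\s*end\s*(?:module|subroutine|function)\b", re.IGNORECASE)


def split_top_level_units(source):
    """Return (kind, name, text) for each top-level module, subroutine, or function."""
    units = []
    current = []      # lines of the unit being collected
    header = None     # (kind, name) of the current top-level unit
    depth = 0         # nesting depth; 0 means we are outside any unit

    for line in source.splitlines():
        end = _END.match(line)
        start = None if end else _START.match(line)
        if start:
            if depth == 0:
                header = (start.group("kind").lower(), start.group("name"))
                current = []
            depth += 1
        if depth > 0:
            current.append(line)
        if end and depth > 0:
            depth -= 1
            if depth == 0:
                # Only units that close at depth 0 become separate segments.
                units.append((header[0], header[1], "\n".join(current)))
    return units


SAMPLE = """\
module shapes
  implicit none
contains
  real function area(r)
    real, intent(in) :: r
    area = 3.14159 * r * r
  end function area
end module shapes

subroutine greet()
  print *, "hello"
end subroutine greet

integer function twice(n)
  integer, intent(in) :: n
  twice = 2 * n
end function twice
"""

units = split_top_level_units(SAMPLE)
for kind, name, text in units:
    print(kind, name, len(text.splitlines()), "lines")
```

Note that the nested `area` function stays inside the `shapes` module segment rather than becoming its own document, which matches the "top-level" wording above.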

Usage

from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser

loader = GenericLoader.from_filesystem(
    "./fortran_code",
    glob="**/*",
    suffixes=[".f90"],
    parser=LanguageParser(language="fortran")
)
docs = loader.load()
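
One detail worth noting about the usage above: with `glob="**/*"` and `suffixes=[".f90"]`, only `.f90` files are picked up, so `.f` or `.f95` sources would need to be listed in `suffixes` as well. The sketch below mimics that selection with `pathlib` only. It is an approximation of the filtering behaviour as I understand it, not the loader's actual code, and `select_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def select_files(root, glob="**/*", suffixes=None):
    """Approximate the loader's file selection: apply the glob, then filter by suffix."""
    root = Path(root)
    return sorted(
        p for p in root.glob(glob)
        if p.is_file() and (suffixes is None or p.suffix in suffixes)
    )


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "solver").mkdir()
    (base / "main.f90").write_text("program main\nend program main\n")
    (base / "solver" / "legacy.f").write_text("      END\n")
    (base / "README.md").write_text("notes\n")

    # Only .f90 is requested, so legacy.f and README.md are skipped.
    selected = [
        p.relative_to(base).as_posix()
        for p in select_files(base, suffixes=[".f90"])
    ]
    # Listing both suffixes picks up the fixed-form file too.
    selected_both = [
        p.relative_to(base).as_posix()
        for p in select_files(base, suffixes=[".f", ".f90"])
    ]

print(selected)
print(selected_both)
```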

Testing

  • Added unit tests for Fortran parser
  • Verified parsing of functions, subroutines, and modules
  • Tested with various Fortran dialects (F77, F90, F95, F2003)
